Inducing Script Structure from Crowdsourced Event Descriptions via Semi-Supervised Clustering

نویسندگان

  • Lilian D. A. Wanzare
  • Alessandra Zarcone
  • Stefan Thater
  • Manfred Pinkal
چکیده

We present a semi-supervised clustering approach to induce script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets (representing event types) and inducing their temporal order. Our model exploits semantic and positional similarity and allows for flexible event order, thus overcoming the rigidity of previous approaches. We incorporate crowdsourced alignments as prior knowledge and show that exploiting a small number of alignments results in a substantial improvement in cluster quality over state-of-the-art models and provides an appropriate basis for the induction of temporal order. We also show a coverage study to demonstrate the scalability of our ap-

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

LSDSem 2017 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics

We present a semi-supervised clustering approach to induce script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets (representing event types) and inducing their temporal order. Our model exploits semantic and positional similarity and allows for flexible event order, thus overcoming the rigidity of previous approaches. We incorporat...

متن کامل

DeScript: A Crowdsourced Corpus for the Acquisition of High-Quality Script Knowledge

Scripts are standardized event sequences describing typical everyday activities, which play an important role in the computational modeling of cognitive abilities (in particular for natural language processing). We present a large-scale crowdsourced collection of explicit linguistic descriptions of script-specific event sequences (40 scenarios with 100 sequences each). The corpus is enriched wi...

متن کامل

Multiple Clustering Views from Multiple Uncertain Experts

Expert input can improve clustering performance. In today’s collaborative environment, the availability of crowdsourced multiple expert input is becoming common. Given multiple experts’ inputs, most existing approaches can only discover one clustering structure. However, data is multi-faceted by nature and can be clustered in different ways (also known as views). In an exploratory analysis prob...

متن کامل

Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering

Although many studies have been conducted to improve the clustering efficiency, most of the state-of-art schemes suffer from the lack of robustness and stability. This paper is aimed at proposing an efficient approach to elicit prior knowledge in terms of must-link and cannot-link from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...

متن کامل

A Hierarchical Bayesian Model for Unsupervised Induction of Script Knowledge

Scripts representing common sense knowledge about stereotyped sequences of events have been shown to be a valuable resource for NLP applications. We present a hierarchical Bayesian model for unsupervised learning of script knowledge from crowdsourced descriptions of human activities. Events and constraints on event ordering are induced jointly in one unified framework. We use a statistical mode...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017